WordCount
Version 1.01
Current Release: Aug. 30, 2006
Initial Release: Aug. 29, 2006
by RedComet
copyright 2006
--------------------------------------

1. Introduction
2. Usage
3. Other
4. Updates

--------------------------------------

1. Introduction

--------------------------------------

WordCount is a utility that scans a user-specified number of files, splits their text
into words using a list of delimiters, and writes a file listing every word found
across those files along with its frequency. Ideally, this will make deciding which
words to include in the dictionary for dictionary encoding much, much easier.

--------------------------------------

2. Usage

--------------------------------------

WordCount is a command-line program that takes 5 parameters: a filebase, the number of
files to scan, a string of delimiters, the file to output the results to, and the
minimum length a string must have to be included in the count:

wordcount filebase no_files delimiters output minimum_length

Let's go with an example. I have the following scripts: dbz4_00.txt ... dbz4_06.txt.
It's very important that the scripts you include be named in the format filebase_XX.txt,
where XX is the two-digit file number. Here the filebase is dbz4 and the number of files
to include is 7.
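
To make the naming rule concrete, here is a minimal Python sketch (not WordCount's
actual source) of how the file names could be built from the filebase and the number of
files. The function name script_names is made up for this example, and it assumes XX is
a zero-padded decimal number starting at 00, as in the dbz4 example.

```python
def script_names(filebase, no_files):
    """Build the expected script names: filebase_00.txt, filebase_01.txt, ..."""
    # Assumption: XX is a two-digit, zero-padded decimal index starting at 00.
    return [f"{filebase}_{index:02d}.txt" for index in range(no_files)]


if __name__ == "__main__":
    # Lists dbz4_00.txt through dbz4_06.txt, one per line.
    for name in script_names("dbz4", 7):
        print(name)
```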

The delimiters determine what constitutes a word. Since the only punctuation I want to
keep inside words is ' (for contractions and so on), I supply WordCount with this:
"# ;:_,-[]!?./". It is very important that the string of delimiters be enclosed in
quotation marks or the program won't run. Also, line breaks are automatically ignored,
so there is no need to list them.

The first character in the delimiters string also decides whether a line is scanned at
all. For example, since the first character is #, the following line will be ignored
and none of its words included in the count:

#This line is ignored.

If you don't want to use this feature, just put a character that doesn't appear in your
scripts first in the delimiters string.
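
The two roles of the delimiters string can be sketched in Python as follows. This is
only an illustration of the behavior described above, not WordCount's actual code. The
name words_in_line is hypothetical, and the sketch assumes that a line is skipped when it
begins with the first delimiter and that the delimiters string is not empty.

```python
def words_in_line(line, delimiters):
    """Split one line into words, or return [] if the line is skipped."""
    # Assumption: line breaks never end up inside a word.
    line = line.rstrip("\r\n")
    # Assumption: a line is skipped when it starts with the first delimiter.
    if line.startswith(delimiters[0]):
        return []
    words, current = [], ""
    for ch in line:
        if ch in delimiters:
            # A delimiter closes the word being built, if there is one.
            if current:
                words.append(current)
            current = ""
        else:
            current += ch
    if current:
        words.append(current)
    return words


if __name__ == "__main__":
    delimiters = "# ;:_,-[]!?./"
    print(words_in_line("#This line is ignored.", delimiters))
    print(words_in_line("Don't stop, Goku!\n", delimiters))
```

Because ' is not in the delimiters string, a contraction such as Don't stays together as
one word.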

Finally, if you don't want strings shorter than 2 characters to be counted, for
instance, you would supply 2 as the minimum length. I'd recommend 3 if you're using this
for a dictionary compression scheme, but I suppose with a little crafty manipulation of
the delimiter string, WordCount would work for DTE, too.

So for the above example, I supply the following command line:
wordcount dbz4 7 "# ;:_,-[]!?./" dbz4_count.txt 3

The frequency of each word (at least 3 characters long), from most frequent to least
frequent, will be recorded in dbz4_count.txt in two separate columns.
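
As a rough illustration of the counting step, here is a minimal Python sketch (again, not
WordCount's actual source). The names count_words and format_report are made up for this
example. It assumes that counting is case-sensitive and that the two columns are the word
followed by its count; the real output layout may differ.

```python
from collections import Counter


def count_words(words, minimum_length):
    """Return (word, count) pairs, most frequent first."""
    # Words shorter than minimum_length are left out of the count.
    # Assumption: "The" and "the" are counted as different words.
    counts = Counter(w for w in words if len(w) >= minimum_length)
    return counts.most_common()


def format_report(ranked):
    """Lay the pairs out as two columns: word, then frequency."""
    # Assumption: columns are padded with spaces to line up.
    width = max((len(word) for word, _ in ranked), default=0)
    return "\n".join(f"{word.ljust(width)}  {count}" for word, count in ranked)


if __name__ == "__main__":
    sample = ["the", "Goku", "the", "a", "Goku", "the", "is"]
    print(format_report(count_words(sample, 3)))
```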

One other thing: raw hex data in the romjuice <$XX> format is ignored by default. This
has a drawback, however. If a script contains the word "Twilight<$FF>Translations", only
"Twilight" will be counted, while "Translations" won't be (unless it appears separately
elsewhere). I couldn't think of any simple way of handling this, but if someone really
needs it, I'd be happy to add it.
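
The effect can be reproduced with a short Python sketch. It only mimics the outcome
described above and is not how WordCount is actually implemented. The names HEX_TAG,
keep_before_tag and split_on_tags are hypothetical, and XX is assumed to be two hex
digits. The second function shows one possible way of counting the text on both sides of
a tag; WordCount itself does not do this.

```python
import re

# Assumption: a raw hex tag is "<$" + two hex digits + ">".
HEX_TAG = re.compile(r"<\$[0-9A-Fa-f]{2}>")


def keep_before_tag(word):
    """Mimic the described behavior: drop everything from the first tag on."""
    match = HEX_TAG.search(word)
    return word if match is None else word[:match.start()]


def split_on_tags(word):
    """Hypothetical alternative: treat each tag as a word boundary."""
    return [part for part in HEX_TAG.split(word) if part]


if __name__ == "__main__":
    word = "Twilight<$FF>Translations"
    print(keep_before_tag(word))
    print(split_on_tags(word))
```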

--------------------------------------

3. Other

--------------------------------------

This is very much a work in progress, and I'll keep updating it if need be. If you have
a feature you'd like to see implemented, or find a bug, let me know via email at
redcomet at rpgclassics dot com or PM me at Romhacking.net.

--------------------------------------

4. Updates

--------------------------------------

-Version 1.01

 I fixed a few bugs so that WordCount automatically ignores line breaks. I doubt anyone
 was going to use them as delimiters, so no big loss. I also added the ability to
 specify a minimum length for the strings that are included in the search. I also added
 the framework for a future release in which WordCount will be able to insert the results
 directly into the ROM and create a table, if so desired. Not sure how I'm gonna handle
 this, what with pointers being an issue. I'll think of something.

-Version 1.00

 Initial release.